{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python 机器学习实战 ——代码样例\n",
    "\n",
    "# 第十五章 朴素贝叶斯分类器"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用朴素贝叶斯进行二分类\n",
    "\n",
    "本节我们将通过一个例子演示使用朴素贝叶斯分类器进行简单的分类。数据集依然使用上一节的逻辑回归中的乳腺癌数据集，一个经典的二分类数据集。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy_score: 0.941489361702\n",
      "混淆矩阵:\n",
      " [[ 61   6]\n",
      " [  5 116]]\n",
      "分类报告:\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "          0       0.92      0.91      0.92        67\n",
      "          1       0.95      0.96      0.95       121\n",
      "\n",
      "avg / total       0.94      0.94      0.94       188\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 首先，导入需要的库。\n",
    "\n",
    "from sklearn.datasets import load_breast_cancer\n",
    "from sklearn.cross_validation import train_test_split  \n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "from sklearn.metrics import classification_report\n",
    "from sklearn.metrics import confusion_matrix\n",
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "# 准备数据集。\n",
    "\n",
    "data = load_breast_cancer()\n",
    "label_names = data['target_names']\n",
    "labels = data['target']\n",
    "feature_names = data['feature_names']\n",
    "features = data['data']\n",
    "\n",
    "# 分离训练集和验证集。\n",
    "\n",
    "train, test, train_labels, test_labels = train_test_split(features,labels,test_size=0.33,random_state=42) \n",
    "\n",
    "# 创建朴素贝叶斯分类器，并拟合数据集。\n",
    "\n",
    "gnb = GaussianNB()\n",
    "model = gnb.fit(train, train_labels)\n",
    "\n",
    "# 在验证集上进行预测，并输出 accuracy score，混淆矩阵和分类报告。\n",
    "\n",
    "preds = gnb.predict(test) \n",
    "print('accuracy_score:',accuracy_score(test_labels, preds))\n",
    "print('混淆矩阵:\\n',confusion_matrix(test_labels, preds))\n",
    "print('分类报告:\\n',classification_report(test_labels, preds))\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
